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Abstract 



11 An outstanding question in comparative genomics is the evolutionary importance of gene expression dif- 

Q^' 12 ferences between species. Rigorous molecular-evolution methods to infer evidence for natural selection 

13 from transcriptional profiling data are at a premium in the field, and to date, phylogenetic approaches 

^ ' 14 have not been well-suited to address the question in the small sets of taxa profiled in standard surveys 

QO I 15 of gene expression. To meet this challenge, we have developed a strategy to infer evolutionary histories 

'/^ ' 16 from expression data by analyzing suites of genes of common function. In a manner conceptually similar 

f^ I 17 to molecular-evolution models in which the evolutionary rates of DNA sequence at multiple loci follow a 

18 gamma distribution, we modeled expression of the genes of an a priori- defined pathway with rates drawn 

19 from an inverse-gamma distribution. We then developed a fitting strategy to infer the parameters of this 

20 distribution from expression measurements, and to identify gene groups whose expression patterns were 

21 consistent with evolutionary constraint or rapid evolution in particular species. Simulations confirmed 

22 the power and accuracy of our inference method. As an experimental testbed for our approach, we gen- 

23 erated and analyzed transcriptional profiles of four Saccharomyces yeasts. The results revealed pathways 

24 with signatures of constrained and accelerated regulatory evolution in individual yeasts, and across the 

25 phylogcny, highlighting the prevalence of pathway-level expression change during the divergence of yeast 

26 species. We anticipate that our pathway-based phylogenetic approach will be of broad utility in the 

27 search to understand the evolutionary relevance of regulatory change. 



28 Author Suminary 

29 Comparative transcriptomic studies routinely identify thousands of genes differentially expressed between 

30 species. The central question in the field is whether and how such regulatory changes have been the 

31 product of natural selection. Can we detect the signal of evolutionarily relevant expression divergence 

32 amid the noise of changes resulting from genetic drift? Our work develops a theory of gene expression 

33 variation among a suite of genes that function together. We derive a formalism that relates empirical 

34 observations of expression of pathway genes in divergent species to the underlying strength of natural 

35 selection on expression output. We show that fitting this type of model to simulated data accurately 

36 recapitulates the parameters used to generate the simulation. Wc then make experimental measurements 

37 of gene expression in a panel of single-celled eukaryotic yeast species. To these data we apply our 

38 inference method, and identify pathways with striking evidence for accelerated or constrained regulatory 

39 evolution, in particular species and across the phylogeny Our method provides a key advance over 

40 previous approaches in that it maximizes the power of rigorous molecular-evolution analysis of regulatory 

41 variation. As such, the theory and tools we have developed will likely find broad application in the field 

42 of comparative genomics. 

43 Introduction 

44 Comparative studies of gene expression across species routinely detect regulatory variation at thousands 

45 of loci [IHl]- Whether and how these expression changes are of evolutionary relevance has become 

46 a central question in the field. In landmark cases, experimental dissection of model phenotypes has 

47 revealed evidence for adaptive regulatory change at individual genes [SHHl ■ These findings have motivated 

48 hypothesis-generating, genome-scale searches for signatures of natural selection on gene regulation. In 

49 addition to molecular-evolution analyses of regulatory sequence [5Hl2| , phylogenetic methods have been 

50 developed to infer evidence for non-neutral evolutionary change from measurements of gene expression 

51 [4l[T3l[T4]. Two classic models of continuous character evolution have been used for the latter purpose: 

52 Brownian motion models, which can specify lineage-specific rates of evolution on a phylogenetic tree 

53 [T5l - [T8] and have been used to model the neutral evolution of gene expression [T41IT9] , and the Ornstein- 

54 Uhlcnbeck model, which by describing lineage-specific forces of drift and stabilizing selection [T5ll20l21| can 

55 be used to test for evolutionary constraint on gene expression [4l[T4]. To date, phylogenetic approaches 



56 have had relatively modest power to infer lineage-specific rates or selective optima of gene expression 

57 levels. This limitation is due in part to the sparse species coverage typical of transcriptomic surveys, 
5s in contrast to studies of organismal traits where observations in hundreds of species can be made to 

59 maximize the power of phylogenetic inference |22H24j . 

60 As a complement to model-based phylogenetic methods, more empirical approaches have also been 

61 proposed that detect expression patterns suggestive of non- neutral evolution [25H27| . We previously 

62 developed a paradigm to detect species changes in selective pressure on the regulation of a pathway, 

63 or suite of genes of common function, in the case where multiple independent variants drive expression 

64 of pathway genes in the same direction [26l[28]. Broadly, pathway- level analyses have the potential to 

65 uncover evidence for changes in selective pressure on a gene group in the aggregate, when the signal at 

66 any one gene may be too weak to emerge from genome-scale scans. However, the currently available tests 

67 for directional regulatory evolution are not well suited to cases in which some components of a pathway 

68 are activated, and others are downregulated, in response to selection. 

69 In this work, we set out to combine the rigor of phylogenetic methods to reconstruct histories of 

70 continuous-character evolution with the power of pathway-level analyses of regulatory change. We rea- 

71 soned that an integration of these two families of methods could be used to detect cases of pathway 

72 regulatory evolution from gene expression data, without assuming a polygenic or directional model. To 

73 this end. we aimed to develop a phylogenetic model of pathway regulatory change that accounted for 

74 differences in evolutionary rate between the individual genes of a pathway. We sought to use this model 

75 to uncover gene groups whose regulation has undergone accelerated evolution or been subject to evolu- 

76 tionary constraint, over and above the degree expected by drift during species divergence as estimated 

77 from genome sequence. As an experimental testbed for our inference strategy, we used the Saccharomyces 

78 yeasts. These microbial eukaryotes span an estimated 20 million years of divergence and have available 

79 well-established orthologous gene calls [35], and yeast pathways are well-annotated based on decades of 

80 characterization of the model organism S. cerevisiae. We generated a comparative transcriptomic data 

81 set across Saccharomycetes by RNA-seq, and we used the data to search for cases of pathway regulatory 

82 change. 



Results 

Modeling the rates of regulatory evolution across the genes of a pathway 

The Brownian motion model of expression of a gene predicts a multivariate normal distribution of observed 
expression levels in the species at the tips of a phylogenetic tree. The variance-covariance matrix of this 
multivariate normal distribution reflects both the relatedness of the species and the rate of regulatory 
evolution along each branch of the tree. We sought to apply this model to interpret expression changes 
in a pre-defined set of genes of common function, which we term a pathway. Our goal was to test 
for accelerated or constrained regulatory variation in a pathway relative to the expectation from DNA 
sequence divergence, as specified by a genome tree. To avoid the potential for over-parameterization 
if the rate of each gene in a pathway were fit separately, we instead developed a formalism, detailed 
in Methods, to model regulatory evolution using a parametric distribution of evolutionary rates across 
the genes. This strategy parallels well-established models of the rate of DNA sequence evolution across 
different sites in a locus or genome j30j . Briefiy, we assumed that each gene draws its rate of evolution in 
the Brownian motion model from an inverse-gamma distribution, and we derived the relationship between 
the parameters of this distribution and the likelihood of expression observations at the tips of the tree. 
This formalism enabled a maximum-likelihood fit of the distribution parameters given empirical expression 
data, and could accommodate models of lineage-specific regulatory evolution, in which a particular subtree 
was described by distinct evolutionary rate parameters relative to the rest of the phylogeny. As a point 
of comparison, we additionally made use of an Ornstein-Uhlenbeck (OU) model [H]: here the rate of 
regulatory evolution of each gene in a pathway, across the entire phylogeny, was drawn from an inverse- 
gamma distribution, and all genes of the pathway were subject to the same degree of stabilizing selection, 
again across the entire tree. 

Our ultimate application of the method was to enumerate all possible Brownian motion models in 
which pathway expression evolved at a distinct rate along the lineages of a subtree relative to the rest 
of the phylogeny, and for each such model, apply our fitting strategy and tabulate the likelihood of 
the data under the best- fit parameter set. To compare these likelihoods and the analogous likelihood 
from the best-fit OU model of universal constraint, we applied a standard Akaike information criterion 
(AIC) p51f?Tll52] to identify strongly supported models. 



HI Simulation testing of inference of pathway regulatory evolution 

112 As an initial test of our approach, we sought to assess the performance of our phylogenetic inference 

113 scheme in the ideal case in which rates of regulatory evolution of the genes of a pathway were simulated 

114 from, and thus conformed to, the models of our theoretical treatment. In keeping with our experimental 

115 application below which used a comparison of Saccharomyces yeast species as a testbed, we developed 

116 a simulation scheme using a molecular clock-calibrated Saccharomyces phylogeny |29| (see Figures 1, 2, 

117 and 5). We first simulated the expression of a single gene subject to accelerated or constrained evolution 
us in a subtree of the phylogeny. As expected, fitting a single-gene Brownian motion or OU model to 

119 these simulated data did not achieve high power or recapitulate model parameters (Figures 1-3, leftmost 

120 data points of each panel), reflecting the challenges of the phylogenetic approach when applied on a 

121 gene- by-gene basis to small trees like the Saccharomyces species set. 

122 We next simulated multi-gene pathways in which the rates of regulatory evolution across the genes 

123 were drawn from an inverse-gamma distribution, with the parameters of this distribution specified to 

124 be distinct for the lineages of a given subtree relative to the rest of the phylogeny. We also simulated 

125 pathways from our OU model of a single, phylogcny-wide inverse-gamma distribution for evolutionary 

126 rates across genes of a pathway and a single, phylogeny-wide degree of constraint. And we simulated 

127 from an equal-rates model in which the rate of regulatory evolution along all branches of the tree in 

128 each gene of a pathway was drawn from the same inverse-gamma distribution. With the simulated 

129 expression data in hand from a given generating model, we fit an OU model, an equal-rates model, and 

130 models of evolutionary rate shifts in each subtree in turn. Comparing AIC weights of the likelihoods, 

131 we observed strong AIC support for the true generating model in cases of lineage-specific regulatory 

132 evolution, approaching AIC weights of 100% for the correct model in large pathways (Figure 1). In 

133 these simulations our method also inferred the correct magnitudes of lineage-specific shifts, for all but 

134 the smallest pathways (Figure 2). Likewise, when applied to simulated expression data generated under 

135 models of phylogcny-wide constraint, our method successfully identified OU as the correct model (Figure 

136 3a) , though with biased estimates of the magnitude of the constraint parameter (Figure 3b) likely due 

137 to a lack of identifiability with the inverse gamma rate parameter (Supplementary Figure 1). For both 

138 classes of model, the performance of our inference method was similar across a range of parameter values 

139 (data not shown). We conclude that our pathway-based phylogenetic approach is highly powered to infer 

140 evolutionary histories of gene expression change, particularly lineage-specific evolutionary rate shifts. As 
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141 a contrast to the poor performance of phylogenctic inference when appHcd to single genes, our results 

142 underscore the utility of the multi-gene paradigm in identifying candidate cases of non- neutral regulatory 

143 evolution. 

144 Phylogenetic inference of gene expression evolution in Saccharomyces yeasts 

145 We next set out to apply our method for evolutionary reconstruction of regulatory change to experimental 

146 measurements of gene expression. The total difference in gene expression between any two species is a 

147 consequence of heritable differences that act in cis on the DNA strand of a gene whose expression is 

148 measured, and of variants that act in trans, through a soluble factor, to impact gene expression of 

149 distal targets. Effects of cis-acting variation can be surveyed on a genomic scale using our previously 

150 reported strategy of mapping of RNA-scq reads to the individual alleles of a given gene in a diploid 

151 inter-specific hybrid |26j , whereas the joint effects of cis and trans-acting factors can be assessed with 

152 standard transcriptional profiling approaches in cultures of purebred species. To apply these experimental 

153 paradigms we chose a system of Saccharomyces sensu stricto yeasts. We cultured two biological replicates 

154 for each of a series of hybrids formed by the mating of S. cerevisiae to S. paradoxus, S. mikatae, and S. 

155 bayanus in turn, as well as homozygotcs of each species. We measured total expression in the species 

156 homozygotcs, and allele-specific expression in the hybrids, of each gene by RNA-seq, using established 

157 mapping and normalization procedures (see Methods) . In each set of expression data, we made use of S. 

158 cerevisiae as a reference: we normalized expression in the homozygote of a given species, and expression of 

159 the allele of a given species in a diploid hybrid, relative to the analogous measurement from S. cerevisiae. 

160 To search for evidence of pathway regulatory evolution in our yeast expression data, we considered 

161 as pathways the pre-defined sets of genes of common function from the Gene Ontology (GO) process 

162 categories. For the genes of each GO term, we used normalized expression measurements in yeast species 

163 and, separately, measurements of cis-regulatory variation from interspecific hybrids, as input into our 

164 phylogenetic analysis pipeline. Thus, for each of the two classes of expression measurements, for a given 

165 GO term we fit models of lineage-specific regulatory evolution incorporating inverse-gamma-distributed 

166 rates across genes; an analogous model with no lineage-specific divergence; and an OU model of universal 

167 constraint. The results revealed a range of inferred evolutionary models and AIC support across GO 

168 terms (Figure 4, Table 1, and Table S2, Table S3), and this complete data set served as the basis for 

169 manual inspection of biologically interesting features. 
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A first emergent trend was the prevalence, across many GO terms, of models of distinct regulatory 
evolution in the lineage to 5*. paradoxus as the best fit to expression measurements in species homozygotes 
(Figure 4a). We noted no such recurrent model in analyses of cis-regulatory variation (Figure 4b), 
implicating trans-acting variants as the likely source of the regulatory divergence in 5*. paradoxus. To 
validate these patterns, we applied our phylogenetie inference method to expression measurements from 
all genes in the genome analyzed as a single group, rather than each GO term in turn. When we used 
expression data from species homozygotes as input for this genome-scale analysis, our method assigned 
complete AIC support to a model in which the rate of evolution was 2.5 times faster on the branch 
leading to S. paradoxus (AIC weight = 1), consistent with results from individual GO terms (Figure 
4). An analogous inference calculation using measurements of cis-regulatory variation, for all genes in 
the genome, yielded essentially complete support for an OU model of universal constraint (AIC weight 
~ .99). Thus, constraint on the czs-acting determinants of gene expression, of roughly the same degree in 
all yeasts, is the general rule from which changes in selective pressure on particular functions may drive 
deviations in individual pathways. However, for many genes, expression in the S. paradoxus homozygote 
is distinct from that of other yeasts out of proportion to its sequence divergence, suggestive of derived, 
trans-acting regulatory variants with pleotropic effects. 

Among the inferences of pathway regulatory evolution from our method, we observed many cases of 
evolutionary interest whose best-fitting model had strong AIC support (Figure 4). For each of 15 GO 
terms, cis-regulatory expression variation measurements yielded inference of an evolutionary model with 
>80% AIC weight (Table 1). Many such GO terms represented candidate cases of polygenic regulatory 
evolution, in which multiple independent variants, at the unlinked genes that make up a pathway, have 
been maintained in some yeast species in response to a lineage-specific shift in selective pressure on ex- 
pression of the pathway components. For example, in replicative cell aging genes (GO term 0001302), 
czs-regulatory variation measured in interspecific hybrids supported a model of polygenic, accelerated 
evolution in S. paradoxus (Figure 5a), with some pathway components upregulated and some down- 
regulated in the latter species relative to other yeasts. The total expression levels of cell aging genes 
in species homozygotes were also consistent with rapid evolution in S. paradoxus (Figure 5a), arguing 
against a model of compensation between cis- and trans-acting regulatory variation, and highlighting 
this pathway as a particularly compelling potential ease of a lineage-specific change in selective pressure. 

In other instances, expression measurements in species homozygotes alone supported models of lineage- 



200 specific evolution, with each such pathway representing a candidate case of non-neutral evolution at 

201 trans-acting regulatory factors. For a total of 41 GO terms, our method inferred models with >80% AIC 

202 weight from homozygote species expression data (Table 2). In these top-scoring pathways, apart from 

203 the genome-scale trend of accelerated evolution in S. paradoxus homozygotes (Figure 4), we also noted 

204 other lineage-specific patterns. These included a gene set annotated in transcription (GO term 0006351), 

205 whose expression levels in S. bayanus were less volatile than those of other yeasts and thus supported a 

206 model of lineage-specific constraint (Figure 5b). The set of top-scoring pathways from analyses of species 

207 homozygote expression data also contined some conforming to the OU model of universal constraint, 

208 such as a set of genes annotated in transport (GO term 0006281), whose expression varied less across all 

209 species than would be expected from the genome tree (Figure 5c). Taken together, our findings indicate 

210 that evolutionary histories can be inferred with high confidence from experimental measurements of 

211 pathway gene expression. In our yeast data, many pathways exhibit expression signatures consistent 

212 with non-neutral regulatory evolution in particular lineages and across the phylogeny. 

213 Discussion 

214 The effort to infer evolutionary histories of gene expression change has been a central focus of modern 

215 comparative genomics. Against a backdrop of a few landmark successes [Hm], progress in the field has 

216 largely been limited by the relatively weak power of phylogcnetic methods when applied, on a gene-by- 

217 gene basis, to measurements from small sets of species. In this work, we have met this challenge with 

218 a method to infer evolutionary rates of any suite of independently measured continuous characters that 

219 can be analyzed together across species. We have derived the mathematical formalism for this model, 

220 and we have illustrated the power and accuracy of our approach in simulations. We have generated 

221 yeast transcriptional profiles that complement available data sets J331I34J by measuring czs-regulatory 

222 contributions to species expression differences as well as the total variation between species. With these 

223 data, we have demonstrated that our phylogcnetic inference method yields robust, interpretable candidate 

224 cases of pathway regulatory evolution from experimental measurements. 

225 The defining feature of our phylogcnetic inference method is that it gains power by jointly leveraging 

226 expression measurements of a group of genes, while avoiding a high-dimensional evolutionary model. 

227 Instead of requiring an estimate of the evolutionary rate at each gene, our strategy estimates the parani- 
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228 eters of a distribution of evolutionary rates across genes. Our approach thus models expression of the 

229 individual genes of a pathway as independent draws from the same distribution. Theories of composite 

230 likelihood make clear that although this independence assumption could upwardly bias the likelihoods 

231 of our best-fit models, model choice and parameter estimates will still be correct on average [35]. We 

232 note that our pathway-level approach is not contingent on the Gaussian models of regulatory evolution 

233 we have used here, and future work will evaluate the advantages of incorporating compound Poisson 

234 process [T3l|36] or more general Levy process [37] models of gene expression. 

235 Our pathway analysis of regulatory variation used gene sets curated through Gene Ontology, but our 

236 method can easily be applied to gene modules defined on the basis of protein or genetic interactions or 

237 coexpression. Any such module is likely to contain both activators and repressors, or other classes of gene 

238 function whose expression may be quantitatively tuned in response to selection by alleles with effects of 

239 opposite sign [351[31]. The phylogenetic approach we have developed here is well-suited to detect these 

240 non-directional regulatory patterns, rather than relying on the coherence of up- or down-regulation of 

241 pathway genes [2BH25BDH12] . Ultimately, a given case of strong signal in our pathway evolution paradigm, 

242 when the best-fit model is one of lineage-specific accelerated regulatory evolution, can be explained either 

243 as a product of relaxed purifying selection or positive selection on pathway output. Our approach thus 

244 serves as a powerful strategy to identify candidates for population-genetic [52] and empirical [CTH5] tests 

245 of the adaptive importance of pathway regulatory change. We have developed an R package, IRS (Inverse 

246 gamma Rate Shifts) , to facilitate the usage of our method. 

247 The advent of RNA-seq has enabled expression surveys across non-model species in many taxa. Max- 

248 imizing the biological value of these data requires methods that evaluate expression variation in the 

249 context of sequence divergence between species. As rigorous phylogenetic interpretation of expression 

250 data becomes possible, these measurements will take their place beside genome sequences as a rich source 

251 of hypotheses, in the search for the molecular basis of evolutionary novelty. 



Methods 



Basic model 



254 To develop Brownian motion and Ornstein-Uhlenbeck models of pathway gene expression, we begin with 

255 the basic established forms of these models. Throughout, we use uppercase letters to represent random 
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256 variables and matrices and lowercase letters to represent nonrandom variables. Assume that we have 

257 measured expression of the genes of a pathway in n species, and that we have a fixed, time-calibrated 

258 phylogeny describing the relationships between those species. We let X,j = {Xii,Xi2, ■ ■ ■ ,Xin) be the 

259 observations of the expression level of the ith gene of the pathway, in each of n species. Both the Brownian 

260 motion and Ornstein-Uhlenbeck (OU) models predict that the vector X.^ is a draw from a multivariate 

261 normal distribution with density 

r 2 trx 1 iTr(x-u)'V"^(x-u) /-,\ 

^^"'^'"'^^= V(2^.Tdet(V) ^ '' ^'^ 

262 where /x is a vector representing the mean expression value at the tips of the phylogenetic tree, V 

263 is a covariance matrix that is specific to the process, and cr^ describes the rate of evolution such that 

264 cr'^Vij = Cov{Xi,Xj) where Vi,j is the i,jth element of V. 

265 If we assume that there is no branch-specific directionality to evolution, we can avoid the need to 

266 estimate fi in either the Brownian motion model or the OU model by a renormalization of the data. We 

267 first arbitrarily choose the gene expression measurements in a single species (say species 1), and define 

268 the new random vector Zi ~ [Zij,^ ^i,3, • ■ • , ^i.n) by 

269 By our assumption that there is no branch-specific directionality, E(Xjj) = E(Xi.i) so E(Zjj) = 

270 for all i and j. Because each X^ is multivariate normally distributed with dimension ri, each Z^ will also 

271 be multivariate normally distributed with dimension n — \ and a slightly different covariance structure. 

272 Letting W be the covariance matrix corresponding to the Z^ . elementary calculations taking into account 

273 variances and covarianees of sums of random variables reveal that 

\v,,, + Vi.i-2V,.i ifi=j 

274 Next, we wish to incorporate into the Brownian motion and OU models a scheme in which the rates 

275 of evolution of the genes of a pathway are not specified independently but instead are drawn from an 
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276 invcrsc-gamnia distribution. The invcrsc-ganiina distribution has density 

%) = f|^2/^("+'^e-f, (2) 

r(a) 

277 where r(-) is the gamma function and a and (3 are shape and scale parameters. The moments of this 

278 distribution are 

E{Y) ^ 



a-1 

and 

Var(r) ^ 



(a-l)2(a-2)' 

280 from which it follows that the inverse-gamma distribution has no mean if a < 1 and no variance if 

281 a < 2. These properties allow for the distribution of rates of gene expression evolution in a pathway to 

282 be relatively broad; on the other hand, the inverse gamma density has no mass at 0, which prevents any 

283 gene in a pathway from not evolving at all. In addition, as a — >■ oo and /3 ^ oo as — 3j = /i stays fixed, 

284 the distribution converges to a point mass at /i. Thus, a model where there is one rate for every gene is 

285 nested within the inverse-gamma distributed rates model. 

286 Computation of the density of the vector of expression measurements Zi under this model is simpli- 

287 ficd by the fact that the inverse-gamma distribution is the conjugate prior to the variance of a normal 

288 distribution. Hence, we see that the density of Z is 



/(z) = r g{z;a',W)h{a')d{a' 
Jo 



— ^e 2^ — r^ c ) ^ ^e ^rf cr I 

v/(27r(72)"-idct(W) r(a)^ 

1 /3" r(a + (n-l)/2) 



(3) 



V(2^)"-i dct(W) (iz'W-iz + /3)"+("^''/' r(a) 

289 Thus, the likelihood of the observations of transcriptome-wide gene expression across the pathway in 
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290 n taxa. normalized by the expression level in taxon 1, is 

r. g.TJ 1 ^ r(a + (n-l)/2) 

^ ' V V(2'r)"-idet(W) (i(z',W-iz, + /3)"+("-i)/2 r(a) ' ^^ 

291 For the application to simulated and experimental data as described below, given observations of gene 

292 expression of the species at the tips of the tree, and a model that specifies the covariance matrix V, we 

293 optimized the log likelihood function using the L-BFGS-B optimization routine in R [44] . 

294 Covariance matrix 

295 In the previous section, we left the unnormalized covariance matrix V unspecified. Here we briefiy recall 

296 the forms of V under Brownian motion and the Ornstein-Uhlenbeck process. Define the height of the 

297 evolutionary tree to be T and and the height of the node containing the common ancestor of taxa i and 

298 j by Uj. Then the covariance matrix for Brownian motion is 

299 and the covariance matrix for the Ornstein-Uhlenbeck process is 

Vi ■ 

where 9 quantifies the strength of stabilizing selection (large 6 corresponds to stronger selection). 

To model lineage-specific shifts in the evolutionary rate of gene expression in the context of the 
Brownian motion model, we adopt a framework similar to that of O'meara et al. |17] . We assume that in 
a specified subtree of the total phylogeny, the rate of evolution of every gene is multiplied by a constant, 
compared to the rest of the tree. Under the Brownian motion model, this is equivalent to multiplying 
the branch lengths in that part of the tree by that same constant; hence, shifts in evolutionary rate are 
incorporated by multiplying the appropriate elements of W by the value of the rate shift. 
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307 Comparing likelihoods among fitted models 

308 To evaluate the support for the distinct models we fit to expression data for a given pathway, we require 

309 a strategy that will be broadly applicable in cases where no a priori expectation of the correct model is 

310 available, such that nested hypothesis testing schemes fl7| are not applicable. Instead, given likelihoods 

311 L from fitting of each model in turn to expression data from the genes of a pathway, we use the Akaike 

312 Information Criterion, 2fc — 2ln{L) [45j . to report the strength of the support for each, where k is the 

313 number of parameters in the model (fc = 2 for the Brownian motion model in which the rate of evolution 

314 is the same along all lineages in the phylogeny, and k = 3 for all other models). 

315 Simulations 

316 For all simulations, we used a phylogenetic tree adapted from Scannell et al. by removing the branch 

317 leading to Saccharomyces kudriavzevii (sec insets of Figures 1, 2 and 5). To perform simulations, we 

318 generated expression data for one gene at a time as follows. We first drew the rate of evolution from the 

319 appropriately parameterized inverse-gamma distribution. Then, without loss of generality, we specified 

320 that the expression level at the root of the phylogeny was equal to 0, and we simulated evolution along 

321 the branches of the Scannell et al. tree according to either a Brownian motion or an Ornstcin Uhlenbeck 

322 process (with optimal expression level equal to 0), using the terminal expression level on a branch as the 

323 initial expression level of its daughter branches. To account for lineage-specific shifts in evolutionary rate 

324 in a simulated pathway, we multiplied the rate of evolution of each gene by the rate shift parameter for 

325 evolution along the branches affected by the rate shift. For each Brownian motion-based rate shift model 

326 applicable to the tree (see insets of Figure 1 and 2), we simulated 100 replicate datasets for each of a 

327 range of gene group sizes, in each case setting a = 3, /3 = 2, and a rate shift parameter equal to 5. For 

328 the Ornstein-Uhlcnbeck model, we simulated 100 replicate datasets for each of a range of pathway sizes 

329 with a = 3, /? = 2, and 9 = 10. 

330 Yeast strains, growth conditions, and RNA-seq 

331 Strains used in this study are listed in Table SI. For pairwise comparisons of S. cerevisiae and each 

332 of S. paradoxus, S. mikatae, and 5". bayanus, two biological replicates of each diploid parent species 

333 and each interspecific hybrid were grown at 25°C in YPD medium [4^ to log phase (between 0.65-0.75 
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334 OD at 600 nm). Total RNA was isolated by the hot acid phenol method [IS] and treated with Turbo 

335 DNA-free (Ambion) aceording to the manufacturer's instructions. Libraries for a strand-specific RNA- 

336 seq protocol on the lUumina sequencing platform, which delineates transcript boundaries by sequencing 

337 poly-adenylated transcript ends, were generated as in |47| with the following modifications: 1) AmpureXP 

338 beads (Beckman) were used to clean up enzymatic reactions; 2) the gel purification and size-selection step 

339 was eliminated; 3) the oligo-dT primer used for cDNA synthesis was phosphorothioated at position ten 

340 (TTTTTTTTTT*TTTTTTTTTTVN, V=A,C,G, N=A,C,G,T, *=phosphorothioate linkage. Integrated 

341 DNA Technologies); and 4) 12 PCR cycles were performed. Libraries were sequenced using 36 bp paired- 

342 end modules on an lUumina IIx Genome Analyzer (Elini Biopharmaceuticals) . 

343 RNA-seq mapping and normalization 

344 Bioinformatic analyses were conducted in Python and R. RNA-seq reads were stripped of their putative 

345 poly- A tails by removing stretches of consecutive Ts fianking the sequenced fragment; reads without at 

346 least two such Ts were discarded, as were reads with Ts at both ends. To ensure that expression data from 

347 hybrid diploids and purebred species could be compared, for each class of expression measurement for a 



348 given pair of species we mapped reads to both species genomes from http://www.saccharomycessensustricto.org 

349 [29j using Bowtie [48] with default settings and fiags -ml -XIOOO. These settings allowed us to retain 

350 only those reads that were unambiguously assigned to one of the two species in each pairwise compar- 

351 ison. A mapped read was inferred to have originated from the plus strand of the genome if its poly-A 

352 tail corresponded to a stretch of As at the 3' end of the fragment, and a read was assigned to the minus 

353 strand if its poly-A tail corresponded to a stretch of Ts at the 5' end of the fragment relative to the 

354 reference genome. To filter out cases in which inferred poly-A tails originated from stretches of As or 

355 Ts encoded endogenously in the genome, we eliminated from analysis all reads whose stretch of As or 

356 Ts contained more than 50% matches to the reference genome. In order to filter out cases of potential 

357 oligo-dT mispriming during cDNA synthesis, we also eliminated from analysis all reads that contained 

358 10 or more As in the 20 nucleotides upstream of their transcription termination site. 

359 We controlled for read abundance biases due to differing GC content as follows. For each lane of 

360 sequencing, we grouped sets of overlapping reads and normalized abundance according to GC content 

361 of the overlapping region using fuU-quantilc normalization as implemented in the package EDASeq |49| . 

362 Normalized abundance was divided by raw abundance to generate a weight that was assigned to every 



365 
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read in the group. These weights were used in place of raw read counts in aU downstream analyses. 
All expression data are available through the Gene Expression Omnibus under identification number 
GSE38875. 



Transcript annotation 



367 Coordinates of orthologous open reading frames (ORFs) in each genome were taken from http://www.saccharomycessensusi 

368 These ORF boundaries in S. cerevisiae differed, in some cases, from ORF definitions in the Saccharomyces 

369 Genome Database [SOI SGD, using the definitions from December 22, 2007]; genes for which the two sets 

370 of definitions did not overlap were discarded. For cases where the definitions overlapped but differed by 

371 more than ten base pairs at either end, we used the boundaries defined by SGD and adjusted ortholog 

372 boundaries in other species accordingly after performing local multiple alignment [51) of the orthologous 

373 regions and fianking sequences as defined by j29) . 

374 For most genomic loci, each sense transcript feature was defined as the region from 50 bp upstream 

375 to 500 bp downstream of its respective ORF. If sequence within this window for a given target ORF 

376 overlapped with the boundaries of an adjacent gene or known non-coding RNA on the same strand, the 

377 sense feature boundaries of the target were trimmed to eliminate the overlap. For tandem gene pairs, 

378 the 3' boundary of the upstream gene sense feature was set to 500 bp past the coding stop or the coding 

379 start of the downstream gene sense feature, whichever was closer; the 5' boundary of the downstream 

380 gene sense feature was set to 50 bp upstream of its coding start or the 3' end of the upstream gene sense 

381 feature, whichever was closer. 

382 We tabulated the GC-normalized expression counts (see above) that mapped to each transcript feature 

383 for each RNA-seq sample. Given the full set of such counts across all features and all samples, we 

384 then applied the upper-quartile between- lane normalization method implemented in EDASeq |49| . The 

385 normalized counts from this latter step for a given species were averaged across all biological replicates 

386 to yield a final expression level for the feature, used in all analysis in this work. 

387 Yeast pathways 

388 We downloaded the list of genes associated with each Gene Ontology process term from the Saccharomyces 

389 Genome Database and filtered for terms containing at least 10 genes. The resulting set comprised 333 

390 terms. 
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391 Visualizing distributions of interspecific expression variation 

392 For visual inspection of expression differences between species in Figure 5, we normalized experimentally 

393 measured data by branch length as follows. If expression evolution follows the same Gaussian-based 

394 model on all lineages of the yeast phylogeny, when the expression level of gene j in taxon i is compared to 

395 that in taxon 1 used as a reference, the marginal distribution Zi_j (the difference in expression between 

396 taxon i and taxon 1 at gene j) is distributed according to a univariate analog of equation ^. In this 

397 case, dividing Zi^j by the absolute branch length between taxon i and taxon 1 eliminates the dependence 
39S of the distribution on the total divergence time between taxa, and the density of this normalized quantity 

399 will be the same for all species comparisons. In the case of lineage-specific shifts in evolutionary rate or 

400 universal selective constraint, one or more taxa will exhibit distinct densities of the normalized expression 

401 divergence measure. Thus, we generated each distribution in Figure 5 by tabulating the log fold-change 

402 in expression between the indicated species and S. cerevisiae, and then dividing this quantity by the 

403 divergence time between the indicated species and S. cerevisiae according to the genome tree. 
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Figure Legends 
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Figure 1. Phylogenetic inference of the correct model of pathway regulatory evolution 
from data simulated under Brownian motion models. Each panel shows results of analyses of 
pathway gene expression data simulated under one Brownian motion-based evolutionary model, with 
dark lines in each inset cartoon indicating lineages subject to accelerated regulatory evolution in the 
respective simulation. In a given panel, each trace reports results from maximum-likelihood fitting of 
the indicated model (legend at right) to simulated data from pathways of varying size; the x axis 
reports the number of genes in a pathway and the y axis reports the Akaike weight of the indicated 
model. Each data point represents results of simulations of the indicated pathway size in which the 
initial evolutionary rate of each pathway gene was drawn from an inverse-gamma distribution with 
a = 3, /3 = 2 and then increased by a factor of 5 for lineages shown in dark grey. In the legend, ER 
denotes a Brownian-motion model of equal evolutionary rates on all branches of the phylogeny, OU 
denotes an Ornstein-Uhlenbeck model of constraint on all branches, and species names denote 
Brownian-motion models of increased evolutionary rate on the subtrees leading to the respective taxa. 
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Figure 2. Phylogenetic inference of the correct degree of accelerated regulatory evolution 
from data simulated under Brownian motion models. Each panel shows results of analyses of 
pathway gene expression data simulated under one Brownian motion-based evolutionary model. 
Simulations and fitting, and the reporting of results, are as in Figure 1, except that in a given panel, the 
y axis reports the shift in the rate of regulatory evolution on the indicated lineage, relative to that on 
the rest of the phylogeny. The black trace reports maximum-likelihood fitted values from analysis of 
simulated data and the dashed line indicates the true value in the generating model of the simulation. 
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Figure 3. Phylogenetic inference of the evolutionary history of pathway regulation from 
data simulated under an Ornstein-Uhlenbeck model. Simulations were performed as in Figure 1 
except that expression data were simulated under an Ornstein-Uhlenbeck model of universal constraint 
across the yeast phylogeny. (a), The x axis reports pathway size and the y axis reports Akaikc weights 
of fitted models as in Figure 1. (b), The x axis reports pathway size and the y axis reports the values of 
the Ornstein-Uhlenbeck constraint parameter 9. The black trace reports maximum-likelihood fitted 
values from analysis of simulated data and the dashed line indicates the true value in the generating 
model of the simulation. 
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Figure 4. Inference of regulatory evolution in yeast pathways. Each panel reports results of 
phylogenetic inference of evolutionary histories of gene expression change from one set of experimental 
transcriptional profiling data. In a given panel, each vertical bar reports results of maximum-likelihood 
fits of Brownian-motion and Ornstein-Uhlcnbeck models to expression of the genes of one Gene 
Ontology process term; the total proportion of a bar corresponding to a particular color indicates the 
Akaike weight of the corresponding model (legend at right, with labels as in Figure 1). (a). Inference 
from total expression measurements in homozygote species, (b). Inference from measurements of 
czs-regulatory variation in interspecific hybrids. 
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Figure 5. Lineage-specific regulatory evolution and constraint in yeast pathways. Each 
panel shows distributions of experimental gene expression measurements among the genes of one yeast 
Gene Ontology process term whose evolutionary history was inferred with strong support. In a given 
panel, each trace reports the expression levels of the genes of the indicated pathway, from the allele of 
the indicated yeast species in a hybrid or in the purebred homozygote of a species, normalized with 
respect to the analogous measurement in S. cerevisiae and with respect to branch length. Inset 
cartoons represent the model inferred with AIC weight >80% for the indicated pathway (sec Tables 1 
and 2). (a) AUele-specific expression from measurements in diploid hybrids (left) and total expression 
measurements in species homozygotes (right) for the genes of GO:0001302, replicative cell aging, 
supporting a model of accelerated evolution in S. paradoxus; in the inset, the number above the bolded 
branch reports the inferred shift in the rate of regulatory evolution along that lineage, (b) Allele-specific 
expression from measurements in diploid hybrids for the genes of GO:0006351, transport, supporting a 
model of constraint in S. bay anus; in the inset, the number above the bolded branch reports the 
inferred shift in the rate of regulatory evolution along that lineage, (c) Total expression measured in 
species homozygotes for the genes of GO:0006281, transcription, supporting an Ornstein-Uhlenbeck 
model of universal constraint; in the inset, the number above the tree reports the inferred value of the 
constraint parameter. Note that in (c), results arc consistent with the effect of selective constraint 
driving expression measurements in each taxon to revert to a universal mean, such that normalizing 
expression measurements by total evolutionary time overcorrects the divergence of long branches. 
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Table 1. Top-scoring fitted models of cis-regulatory evolution in yeast pathways. 

Constraint or shift parameter 
49.97745883 
0.230918849 
0.258701476 
3.197059161 
4.292287639 
0.037806902 
3.733770466 
3.079387696 
11.43989834 
9.314688245 
0.188381056 
0.043114988 
0.005082617 
0.040060263 
4.060128813 



GO term 


Model 


wAIC 


34599 


Ornstein-Uhlenbeck 


0.899405768 


6355 


S. hayanus shift 


0.837382338 


6351 


S. bayanus shift 


0.849912647 


1302 


S. paradoxus shift 


0.859866949 


6897 


S. paradoxus shift 


0.965743399 


6338 


S. cerevisiae shift 


0.840339574 


42254 


Ornstein-Uhlenbeck 


0.924785133 


6364 


Ornstein-Uhlenbeck 


0.902358815 


44255 


S. paradoxus shift 


0.945799302 


54 


S. paradoxus shift 


0.91523272 


16310 


S. bayanus shift 


0.902247359 


8152 


S. bayanus shift 


0.844716856 


6629 


S. bayanus shift 


0.91650274 


122 


S. bayanus shift 


0.819216472 


30437 


S. paradoxus shift 


0.931136455 



Each row reports the results of phylogenetic inference of the evolutionary history of gene regulation for 
one yeast Gene Ontology process term, from experimental measurements of cis-regulatory variation in 
interspecific yeast hybrids. Model, best-fit model from among the five possible Brownian motion models 
of evolutionary rate shift in lineages of the Saccharomyces phylogeny (see Figure 1), the 
Ornstein-Uhlenbeck (OU) model of universal constraint, and the equal-rates model involving no 
lineage-specific differences in evolutionary rate. wAIC, Akaike Information Criterion weight of the 
indicated model. Constraint or shift parameter, fitted value of the strength of purifying selection or the 
shift in the rate of regulatory evolution on the indicated lineage, when the best-fit model was the OU 
model of constraint or a Brownian motion lineage-specific evolutionary rate model, respectively. 
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Table 2. Top-scoring fitted models of species regulatory evolution in yeast pathways. 



GO term 


Model 


wAIC 


6397 


S. paradoxus shift 


0.965171603 


8033 


S. paradoxus shift 


0.969683391 


71038 


S. paradoxus shift 


0.89725301 


480 


S. paradoxus shift 


0.928296518 


42274 


S. paradoxus shift 


0.958076119 


472 


S. paradoxus shift 


0.953733629 


15031 


S. hayanus shift 


0.872939854 


1302 


S. paradoxus shift 


0.999927135 


6006 


S. paradoxus shift 


0.816341854 


6260 


S. paradoxus shift 


0.831407464 


30163 


S. paradoxus shift 


0.82364567 


6897 


S. paradoxus shift 


0.970677101 


6412 


S. paradoxus shift 


0.981277345 


7121 


S. paradoxus shift 


0.998579562 


6914 


Ornstein-Uhlenbeck 


0.810293525 


30488 


5". paradoxus shift 


0.893282646 


42254 


S. paradoxus shift 


0.99999983 


6200 


S. paradoxus shift 


0.81144199 


6468 


S. paradoxus shift 


0.990399439 


16567 


S. paradoxus shift 


0.959694914 


6364 


S. paradoxus shift 


0.999995709 


6754 


S. paradoxus shift 


0.816303046 


422 


Ornstein-Uhlenbeek 


0.877576591 


463 


S. paradoxus and S. cerevisiae shift 


0.958484282 


6414 


S. paradoxus and S. cerevisiae shift 


0.906687775 


19236 


S. paradoxus shift 


0.989881765 


31505 


S. paradoxus shift 


0.955855579 


32259 


S. paradoxus shift 


0.998665437 


6506 


S. paradoxus shift 


0.982054204 


16310 


S. paradoxus shift 


0.99652632 


447 


S. paradoxus shift 


0.994506418 


6281 


Ornstein-Uhlenbeek 


0.882367142 


71042 


S. paradoxus shift 


0.804318406 


6378 


S. cerevisiae shift 


0.845112064 


7165 


S. paradoxus shift 


0.811091269 


6810 


Ornstein-Uhlenbeck 


0.859937275 


6812 


5". paradoxus shift 


0.898839416 


8150 


S. paradoxus shift 


0.999962114 


6417 


5". paradoxus shift 


0.925463092 


6407 


5". paradoxus shift 


0.988260506 


462 


S". paradoxus shift 


0.817627126 



Constraint or shift parameter 



3.028130303 
3.714749932 
6.751073973 
4.460579672 
8.083546161 
4.686648741 
0.183834463 
6.671016575 
4.6555377 
3.043207869 
7.009201233 
4.408614609 
2.770778823 
16.81960721 
41.38598192 
7.945094861 
6.856141937 
5.590943868 
2.655209273 
3.313920599 
5.841035759 
4.668929462 
57.08946364 
10.39289039 
8.121469425 
6.821984459 
3.032267535 
4.546902844 
5.468542886 
2.487101867 
5.252074336 
3.410968446 
6.030946867 
l.OOE-04 
4.465389345 
2.618523967 
4.312524185 
2.871955612 
5.339113187 
8.792447836 
7.291083934 



Data are as in Table 1 except that inferences were made from experimental measurements of expression 
in purebred yeast homozygotes. 
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Supplementary Figure Legends 




Supplementary Figure 1. Inferred values of parameters in simulations under an 
Ornstein-Uhlenbeck model of pathway regulatory evolution. In the main plot, each data point 
reports the results of one simulation of pathway regulatory evolution under an Ornstein-Uhlenbeck 
(OU) model in which the rates of regulatory evolution of pathway genes were drawn from an 
inverse-gamma distribution with a = 3 and /? = 2 and the OU constraint parameter 9 was set to 10, 
after which parameter values for an OU model were optimized against the simulated expression data. 
For histograms at top and left, the independent variable is shared with the axis of the main plot and 
reports the indicated parameter value, and the dependent variable reports the proportion of simulated 
data sets in which the corresponding value was inferred. Note that inferences from most simulated data 
sets accurately estimate /3 and 9, but for a few data sets, large parameter values are inferred. 
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Supplementary Table Legends 

Supplementary Table 1. Strains used in this work. 



Supplementary Table 2. Fitted models of czs-regulatory evolution in yeast pathways. Data 

are as in Table 1 of the main text except that results for all pathways are shown. 



Supplementary Table 3. Fitted models of species regulatory evolution in yeast pathways. 

Data arc as in Tabic 2 of the main text except that results for all pathways arc shown. 



